Abstract 



The mean time required by a transcription factor (TF) or an enzyme
to find a target in the nucleus is of prime importance for the
initiation of transcription, gene activation or the start of DNA
repair. We obtain new estimates for the mean search time when the TF
or enzyme, confined to the cell nucleus, can switch between a
one-dimensional motion along the DNA and a free Brownian regime inside
the crowded nucleus. We give analytical expressions for the mean time
the particle stays bound to the DNA, $\tau_{DNA}$, and the mean time it
diffuses freely, $\tau_{free}$. Contrary to previous results but in
agreement with experimental data, we find
$\tau_{DNA} \approx 3.7\,\tau_{free}$ for the Lac-I TF. The formula
obtained for the time required to bind to a target site is found to be
consistent with observed data. We also conclude that a higher DNA
density leads to a more efficient search process.
Introduction. The search for a target promoter sequence by a
transcription factor (TF), or for a double strand break in the DNA by an
enzyme such as Rac-A, is a fundamental process of cell activity and
survival. In the first case, the search process controls gene
expression, while in the second, it precedes DNA repair. In both cases
timing is crucial as, for example, unrepaired breaks are an obstacle for
normal cell function and can lead to mutations or apoptosis [6].

The analysis of the mean time required for a TF to bind to a promoter
site originates from the early work of Berg and von Hippel [3], [1], [2].
They proposed a new, now well accepted scenario to resolve the apparent
paradox that the binding was, as experimentally observed, much faster
than it would be if only free diffusion were involved. In this scenario,
the TF can be trapped by an unspecific potential energy and slide along
the DNA molecule. It then either finds its final target or detaches
through thermal noise and diffuses freely until it binds to another
portion of the DNA. This process iterates until the final site is
reached. Recent experiments have studied the kinetics of binding and
unbinding to the DNA using single particle tracking. In the case of
Lac-I, the time spent bound to the DNA represents about 87% [9] of the
total search time.

The process of sliding along the DNA can be modeled as a sequence of
jumps between local potential wells resulting from the interaction with
the base pairs (bp). Approximating this motion by a reduced
one-dimensional Brownian motion leads to a large variance of the
diffusion coefficient [H], and today this approximation is thus
understood as a drastic simplification. A refined analysis was developed
in [4] for the mean number of bp scanned during each binding to the DNA
and the time required to find the target site, using results on motion
in random environments [15].

In this letter, we revisit the computation of the search time $T_s$. We
begin with the Berg-von Hippel model [1]. In our model, a TF is confined
in the nucleus, which contains a set of DNA molecules. The interaction
between the TF and the DNA molecule is modeled by a potential well,
obtained by summing a specific and an unspecific potential [2], [11].
The unspecific potential accounts for the interaction between the TF and
the general structure of the DNA, while the specific potential accounts
for the interaction between the TF and the DNA bp. Restricted by the
unspecific potential, the TF can slide and scan potential binding sites
along the DNA until it detaches through thermal noise. Once unbound, the
TF diffuses freely in the nucleus until it comes close enough to the DNA
to bind again.

To obtain an asymptotic estimate of the number of bp scanned per
binding, $\bar{n}$, we generalize the computations of [2], [3] by using
the notion of random potential and the solution of the mean first
passage time equation [13]. Based on the narrow escape computations [7],
we estimate the mean time $\tau_{free}$ the TF spends in the nuclear
space before rebinding to a DNA molecule. Here the term "nucleus" refers
either to the nucleus of a eukaryotic cell or to the entire bacterium
for a prokaryotic organism. The term "transcription factor" (TF) refers
either to a transcription factor or to a DNA binding protein whose
dynamical behavior can be modeled by the same general assumptions (for
example the Rac protein involved in DNA repair in bacteria).
General expression of the mean search time. We recall the general
expression of the mean search time, $T_s$, required by a TF to bind to
its target pL]. We express it as a function of $\tau_{DNA}$, the time
spent bound to the DNA, $\bar{n}$, the mean number of bp scanned during
this time, $N_{bp}$, the total number of bp in the DNA, and
$\tau_{free}$, the time spent freely diffusing in the nucleus. By
conditioning on the number of bindings to the DNA, the total search time
is given by [3], [H]

$$T_s \approx (\tau_{DNA} + \tau_{free}) \frac{N_{bp}}{\bar{n}} \tag{1}$$

Our goal is to obtain explicit formulas for $\bar{n}$, $\tau_{DNA}$ and
$\tau_{free}$ as functions of the geometry of the nucleus, the DNA
distribution and other physical parameters.
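As a quick numerical illustration of formula (1), the sketch below plugs in the Lac-I values reported in table 1 (5.7 ms bound, 1.5 ms free, 4.8 × 10^6 bp, 75 bp scanned per binding). The function name `mean_search_time` is ours and is only meant for illustration.

```python
def mean_search_time(tau_dna, tau_free, n_bp, n_scanned):
    """Formula (1): (tau_DNA + tau_free) * N_bp / n, in the time unit of tau."""
    return (tau_dna + tau_free) * n_bp / n_scanned

# Values quoted in table 1 for Lac-I in E. coli (times in seconds).
T_s = mean_search_time(tau_dna=5.7e-3, tau_free=1.5e-3, n_bp=4.8e6, n_scanned=75)

# Share of each bind/unbind cycle that is spent on the DNA.
bound_fraction = 5.7e-3 / (5.7e-3 + 1.5e-3)

print(f"T_s = {T_s:.0f} s")
print(f"fraction of time bound = {bound_fraction:.2f}")
```

With these rounded inputs the search time comes out at roughly 460 s, between seven and eight minutes, and the TF is bound for a little under 80% of the time.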
Estimate of $\tau_{DNA}$. To estimate $\tau_{DNA}$, the average time a
TF stays attached to the DNA, we study the interaction potential with
the DNA backbone. This potential is mainly due to the charged phosphate
groups of the DNA backbone. We hence model it as a potential
$V(r) = -\frac{k}{r}$, with $r$ the distance to the DNA axis.

We account for the impenetrability of the DNA molecule to the TF by
imposing a reflecting boundary condition at $r = R_{int}$, the radius of
the DNA double helix. We consider that the TF is freed when it reaches
the position $r = R_{ext}$, which corresponds to the maximal distance
that allows bp discrimination. In practice, we choose
$R_{ext} = 2R_{int}$. The mean time $u(r)$ that a TF starting at
position $r$ stays confined near the DNA molecule satisfies

$$\Delta u(r) - \frac{1}{k_B T}\nabla V(r)\cdot\nabla u(r) = -\frac{1}{D}, \quad R_{int} < r < R_{ext}, \tag{2}$$

$$u(R_{ext}) = 0 \quad \text{and} \quad \frac{\partial u}{\partial n}(R_{int}) = 0,$$

where $n$ is the normal vector to the reflecting boundary. We solve this
equation by direct integration and approximate the resulting expression
by Laplace's method:

$$\tau_{DNA} = u(r) \approx \frac{1}{D}\,\frac{R_{int}}{R_{ext}}\,(R_{int} - R_{ext})^2 \left(\frac{k_B T}{E_{ns}}\right)^2 e^{-\frac{E_{ns}}{k_B T}} \tag{3}$$

where $E_{ns} = V(R_{ext}) - V(R_{int})$ is the potential depth. For
Lac-I and the parameters given in table 1, $\tau_{DNA} \approx 5.7$ ms,
which is compatible with observed data (5 ms, [9]).
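The order of magnitude of the bound time can be checked numerically. The sketch below assumes that the Laplace estimate takes the form $\frac{1}{D}\frac{R_{int}}{R_{ext}}(R_{int}-R_{ext})^2 (k_BT/|E_{ns}|)^2 e^{|E_{ns}|/k_BT}$, which is what direct integration gives for an attractive $1/r$ potential, with the well depth entered as a positive number. The diffusion constant $D = 3\,\mu m^2/s$ is an assumed value used only for illustration, and the function name is hypothetical.

```python
import math

def tau_dna_laplace(D, r_int, r_ext, depth_kT):
    """Laplace estimate of the bound time for an attractive 1/r potential.

    depth_kT is the well depth |E_ns| in units of k_B T (a positive number).
    """
    prefactor = (r_int / r_ext) * (r_int - r_ext) ** 2 / D
    return prefactor * math.exp(depth_kT) / depth_kT ** 2

D = 3.0e-12  # m^2/s; ASSUMED value, chosen for illustration
tau = tau_dna_laplace(D, r_int=1e-9, r_ext=2e-9, depth_kT=16.0)
print(f"tau_DNA = {tau * 1e3:.1f} ms")
```

With the radii and well depth of table 1, this assumed $D$ gives a value within a few percent of the 5.7 ms quoted in the text. The exponential dependence on the well depth means that a change of one $k_BT$ shifts the result by almost a factor of three.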

Mean number of sites scanned. A TF whose motion is restricted through
interaction with the DNA is said to be unspecifically bound. It then
moves along the DNA driven by the unspecific potential. We first
estimate the average number of bp visited for a constant specific
potential. The number of sites scanned during a time $\tau$, $n(\tau)$,
is equal to

$$n(\tau) = \frac{\max_{t \in [0,\tau]} x(t) - \min_{t \in [0,\tau]} x(t)}{l_{bp}}, \tag{4}$$

where $l_{bp}$ is the length of a bp and $x(t)$ is the position on the
DNA of a TF with $x(0) = 0$. When the DNA molecule is approximated by an
infinite line, the distribution of the max (and of $-\min$) is given by

$$P\left(\max_{t \in [0,\tau]} x(t) < x_0\right) = \mathrm{erf}\left(\frac{x_0}{\sqrt{4D\tau}}\right) \tag{5}$$

where $\mathrm{erf}(x) = 1 - \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}dt$.
This distribution allows us to compute the mean value of the max for a
given time $\tau$. The time spent unspecifically bound is exponentially
distributed for a potential well deep compared to $k_B T$ [13], and its
mean, $\tau_{DNA}$, is given in formula (3). Thus the TF scans an
average number of bp given by

$$\bar{n}_0 = \int_0^{\infty} n(\tau)\, e^{-\frac{\tau}{\tau_{DNA}}}\, d\left(\frac{\tau}{\tau_{DNA}}\right) = \frac{2\sqrt{D\tau_{DNA}}}{l_{bp}}. \tag{6}$$

This leads to a 40% increase compared to the mean square displacement
formula.
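The 40% figure can be reproduced in a few lines. The sketch below uses two standard facts: the mean range of a one-dimensional Brownian motion over a fixed time $t$ is $4\sqrt{Dt/\pi}$, and the mean of $\sqrt{t}$ for an exponential law of mean $\tau_{DNA}$ is $\Gamma(3/2)\sqrt{\tau_{DNA}}$. The function names are ours, and the inputs are set to 1 because the ratio does not depend on them.

```python
import math

def mean_sites_scanned(D, tau_dna, l_bp):
    """Mean range of 1D Brownian motion over an exponential time, in bp.

    For a fixed time t the mean range is 4*sqrt(D*t/pi); averaging sqrt(t)
    over an exponential law of mean tau_dna gives Gamma(3/2)*sqrt(tau_dna).
    """
    mean_sqrt_t = math.gamma(1.5) * math.sqrt(tau_dna)
    return 4.0 * math.sqrt(D / math.pi) * mean_sqrt_t / l_bp

def msd_estimate(D, tau_dna, l_bp):
    """Root mean square displacement sqrt(2*D*tau_dna), in bp."""
    return math.sqrt(2.0 * D * tau_dna) / l_bp

# Illustrative inputs only: the ratio below does not depend on them.
D, tau_dna, l_bp = 1.0, 1.0, 1.0
ratio = mean_sites_scanned(D, tau_dna, l_bp) / msd_estimate(D, tau_dna, l_bp)
print(f"ratio = {ratio:.3f}")
```

The ratio equals $\sqrt{2}$, that is, an increase of about 40% over the displacement estimate.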

Non constant specific potential. We consider a more realistic model in
which we estimate the probabilities and the mean time required to move
one bp. We take into account the local interaction between the TF and
the DNA bp. Although such an approach was considered in [5], our new
estimate for the number of bp visited, $\bar{n}$, differs by a factor of
two compared to [1].
Number of bp scanned. When bound to the DNA molecule, the TF can move
one bp to the right (resp. to the left) with a probability $p_i$ (resp.
$q_i$). We let $w_i = \frac{p_i}{q_i}$. Following the theory of random
walk in a random one-dimensional potential [U], the average number of
steps $S_{0,N}$ needed by a TF to go from position 0 to $N$ for the
first time is given by:

$$S_{0,N} = N + \sum_{k=0}^{N-1} w_k + \sum_{k=0}^{N-2} \sum_{i=k+1}^{N-1} (1 + w_k) \prod_{j=k+1}^{i} w_j \tag{7}$$

If $u$ denotes the average time needed by the TF to move one bp, then
the mean square displacement during a time $\tau$, expressed in bp,
$N_\tau$, is the solution of

$$\tau = u\, S_{0,N_\tau} \tag{8}$$
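To make the structure of the step count concrete, here is a minimal sketch that evaluates the sum $N + \sum_k w_k + \sum_k \sum_i (1 + w_k)\prod_j w_j$ directly and compares it with the equivalent recursion $d_k = 1 + w_k + w_k d_{k-1}$, whose terms add up to the same total. How the ratios $w_k$ are formed from the jump probabilities is left to the caller; the function names and the `rough` test values are hypothetical.

```python
def mean_steps(w):
    """Evaluate the double-sum expression for a list of ratios w[0..N-1]."""
    n = len(w)
    total = n + sum(w)
    for k in range(n - 1):
        prod = 1.0
        for i in range(k + 1, n):
            prod *= w[i]  # running product w[k+1] * ... * w[i]
            total += (1.0 + w[k]) * prod
    return total

def mean_steps_recursive(w):
    """Same quantity from the recursion d_k = 1 + w_k + w_k * d_{k-1}."""
    total, d = 0.0, 0.0
    for wk in w:
        d = 1.0 + wk + wk * d
        total += d
    return total

flat = [1.0] * 10                       # constant potential, no bias
rough = [0.5, 2.0, 1.5, 0.8, 1.2, 0.6]  # arbitrary illustrative ratios
print(mean_steps(flat), mean_steps_recursive(flat))
print(mean_steps(rough), mean_steps_recursive(rough))
```

For a flat potential every ratio equals 1 and the count reduces to $N(N+1)$, which grows like $N^2$ as expected for an unbiased walk. The recursion runs in linear time and is the practical choice for long sequences.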

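Jump probabilities. The probability $p_i$ that a TF at position $x(i)$,
on bp $i$, moves to the right is $p_i = p(x(i))$, where $p$ satisfies
[11]

$$\frac{d^2 p}{dx^2} - \frac{1}{k_B T}\frac{dV}{dx}\frac{dp}{dx} = 0, \tag{9}$$

$$p(x(i-1)) = 0 \quad \text{and} \quad p(x(i+1)) = 1.$$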
For a piecewise constant potential $V$, equal to $E_i$ near bp $i$, we
solve equation (9) and obtain:

$$w_i = \frac{p_i}{q_i} = \frac{p_i}{1 - p_i} = \frac{e^{\frac{E(i-1)}{k_B T}} + e^{\frac{E(i)}{k_B T}}}{e^{\frac{E(i+1)}{k_B T}} + e^{\frac{E(i)}{k_B T}}}. \tag{10}$$
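The jump probabilities can be tabulated directly from a list of site energies. The sketch below reads (10) as $\frac{p_i}{1-p_i} = \frac{a}{b}$ with $a = e^{E(i-1)/k_BT} + e^{E(i)/k_BT}$ and $b = e^{E(i+1)/k_BT} + e^{E(i)/k_BT}$, so that $p_i = a/(a+b)$. The function name and the energy values are hypothetical.

```python
import math

def jump_right_probability(e_prev, e_here, e_next, kT=1.0):
    """p_i obtained from p_i / (1 - p_i) = a / b, hence p_i = a / (a + b)."""
    a = math.exp(e_prev / kT) + math.exp(e_here / kT)
    b = math.exp(e_next / kT) + math.exp(e_here / kT)
    return a / (a + b)

energies = [0.0, 1.0, -2.0, 0.5, 0.0]  # hypothetical site energies, in units of kT
for i in range(1, len(energies) - 1):
    p = jump_right_probability(energies[i - 1], energies[i], energies[i + 1])
    print(f"site {i}: p = {p:.3f}, q = {1 - p:.3f}")
```

A flat landscape gives $p_i = 1/2$, and a high energy site on the right lowers the probability of a jump to the right, as one would expect from a barrier.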



Average time required to move one bp. To evaluate expression (8), we
estimate the mean time $u$ required by a TF to move one bp. It is the
solution of Dynkin's equation given in [3] with the absorbing conditions
$u(x(i-1)) = u(x(i+1)) = 0$. We explicitly solve this equation and
obtain the average time $u_i = u(x(i))$ to move one step to the left or
to the right for a piecewise constant potential:




^^ = 7^\^+n^^. m-^. I (11) 



where $l_{bp}$ is the average length of a bp.
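As a cross-check on the mean time per step, the sketch below evaluates the exit time obtained by direct integration under one explicit assumption: the potential takes the values $E(i-1)$, $E(i)$, $E(i+1)$ on cells that switch at the midpoints between neighbouring bp, with absorbing ends at $x(i-1)$ and $x(i+1)$. Under that assumption the integration gives $\frac{l_{bp}^2}{2D}\,\frac{(a_{i-1}+a_i)(a_i+a_{i+1})}{a_i\,(a_{i-1}+2a_i+a_{i+1})}$ with $a_j = e^{E(j)/k_BT}$. This closed form is our own working and may differ in presentation from the expression in the text; the function name and the value of $D$ along the DNA are hypothetical.

```python
import math

def mean_step_time(e_prev, e_here, e_next, l_bp, D, kT=1.0):
    """Mean exit time from bp i to either neighbour.

    ASSUMPTION: V equals E(i-1), E(i), E(i+1) on cells that switch at the
    midpoints between base pairs; absorbing ends at x(i-1) and x(i+1).
    """
    a_prev = math.exp(e_prev / kT)
    a_here = math.exp(e_here / kT)
    a_next = math.exp(e_next / kT)
    num = (a_prev + a_here) * (a_here + a_next)
    den = a_here * (a_prev + 2.0 * a_here + a_next)
    return l_bp ** 2 / (2.0 * D) * num / den

l_bp, D = 0.34e-9, 1.0e-13  # D along the DNA is a hypothetical value
print(mean_step_time(0.0, 0.0, 0.0, l_bp, D))  # flat potential
print(mean_step_time(2.0, 0.0, 2.0, l_bp, D))  # site flanked by two barriers
```

For a flat potential the expression reduces to the free diffusion result $l_{bp}^2/(2D)$. For a site flanked by two barriers of height $E$ it becomes $\frac{l_{bp}^2}{4D}(1 + e^{E/k_BT})$, so the waiting time grows exponentially with the barrier height.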

Number of potential binding sites scanned. We denote by $i$ the position
at which the TF's binding site begins and by $n$ the number of bp
interacting with the TF. The position weight matrix model [16], [13] has
already been shown to be equivalent to a normal distribution of $E_i$,
the specific energy of a given site $i$ ([!]). In addition, the specific
energies for sites starting at positions $i$ and $j$ can be correlated.
For $|i - j| > n$ the specific energies are independent. For
$|i - j| < n$ there are $n - |i - j|$ bp contributing to both energies,
which induces a correlation between the energies of the sites $i$ and
$j$. One can further show, by taking linear combinations
$aE_i + bE_{i+1}$, that $(E_i, E_{i+1})$ follows a bivariate normal law.

We can then estimate expression (7) by neglecting the two terms of order
$N$ compared with the term of order $N^2$. With $X_\tau = N_\tau l_{bp}$
the mean square displacement:



/_ 9\ / + E(j) \ 
/ ^ \ I G ~n 6 \ 

So,N^ ~ I J lE(£;^)^^g^ eW (12) 

\''bp / \yQ kT -\- e kT J 

for couples such that $|j - i| > n$. We can then average over the
different energy levels and find an estimate with Laplace's method.
Similarly, we estimate $u$ by averaging (11) over the energy levels. We
then obtain $X_\tau$ with equation (8). By considering that the mean
square displacement is proportional to the average number of bp scanned,
and with formula (6), we obtain $\bar{n}$, the mean number of bp visited
during a typical one-dimensional walk,



n 



\ 



_£2{l + p) 



A(kT^-i /l I -^^(l-P) 



2{fcT) 



3<t2(1-p) 
2 I 1 I 3e^{fcTp" 



(13) 



with $\sigma^2 = \mathbb{E}(E_i^2)$ the variance and
$\rho = \frac{\mathbb{E}(E_i E_{i+1})}{\sigma^2}$ the correlation
factor. In figure 1 we show how $\bar{n}$ depends on $\sigma$ and
$\rho$. For large $\sigma$, we approximate $\bar{n}$ by:



where $\bar{n}_0$ is given in equation (6). We find $\bar{n} = 75$ sites
visited during a typical one-dimensional walk of 5 ms with the data in
table 1. This can be compared with the experimental value of ~ 85 [9].

Free diffusion time. We now estimate $\tau_{free}$, the mean time a TF
freely diffuses in the nucleus between two consecutive DNA bindings. As
stated at the end of the introduction, the term "nucleus" refers either
to the nucleus of a eukaryotic organism (modeled as a sphere of radius
$R$) or to the entire bacterium for a prokaryotic organism (modeled as a
cylinder of radius $R$ for E. coli). We consider that the DNA is
organized (Fig. 2) on a square lattice of $N_{st}$ parallel cylindrical
strands of diameter
$2\epsilon = 2R\sqrt{\rho_{DNA}/N_{st}} \approx 30$ nm, where
$\rho_{DNA}$ is the ratio of absorbing DNA volume to the total nuclear
volume. These strands account for the DNA structure below the 30 nm
fibers. We consider here such an organization, which has been observed
in some bacteria after cell irradiation [6].

For parallel DNA strands, we can, by symmetry, consider only a single
two-dimensional square (Fig. 2). The TF is absorbed at the external
radius $\epsilon$, and is considered to be reflected on the square
boundary, as it enters a symmetrical and identical square. In
cylindrical coordinates the mean time

Figure 1: Left: Mean number of sites visited, $\bar{n}$, as a function
of $s = \sigma/(k_B T)$ for no correlation ($\rho = 0$), expressed in
multiples of $\bar{n}_0$. Right: Number of sites visited for
$\sigma = k_B T$ as a function of the correlation factor $\rho$, in
multiples of the value for $\rho = 0$. A positive correlation,
$\rho > 0$, is associated with a lesser apparent roughness of the
specific potential and with more sites scanned. With $\rho < 0$, low
energy sites tend to be flanked by high energy sites, which leads to a
greater number of local minima of the specific potential and to fewer
sites visited.




Figure 2: Left: Schematic of a two-dimensional section of the nucleus
perpendicular to the DNA, which is organized on a square lattice. Each
DNA strand is compacted in a 30 nm fiber. We approximate the nucleus by
a collection of boxes. Right: Free diffusion time $\tau_{free}$, in
multiples of $\frac{\epsilon^2}{D}$, plotted as a function of the DNA
density $\rho_{DNA}$. High DNA density leads to a faster DNA search.
$\frac{\epsilon^2}{D} \approx 0.1$ ms for the parameters of table 1.
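
before absorption, $u(r,\theta)$, when starting at position
$(r,\theta)$ satisfies

$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial u}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} = -\frac{1}{D},$$

$$\frac{\partial u}{\partial n} = 0 \ \text{on the square of side} \ \sqrt{\frac{\pi\epsilon^2}{\rho_{DNA}}}, \tag{14}$$

$$u(r,\theta) = 0 \ \text{for} \ r = \epsilon, \ \theta \in [0, 2\pi],$$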
where $n$ is the normal vector to the square boundary. The solution can
be expressed in the form

$$u = u_0 + A\ln\left(\frac{r}{\epsilon}\right) + \sum_{n=1}^{\infty}\left(A_n\left(\frac{r}{\epsilon}\right)^{n} + B_n\left(\frac{r}{\epsilon}\right)^{-n}\right)\cos(n\theta)$$

where $A$, $A_n$ and $B_n$ are constants to be determined and
$u_0 = -\frac{r^2 - \epsilon^2}{4D}$. The absorbing boundary condition
at $r = \epsilon$ requires $A_n = -B_n$. Moreover, by symmetry, only
$A$, $A_{4n}$ and $B_{4n}$ are non-zero. $\tau_{free}$ is the average of
$u$ over a uniform initial distribution. We need to estimate the
coefficient $A$ and the other remaining terms, since they carry a
contribution due to the effect of the corners. To find the coefficients,
we use the reflecting boundary condition. We let:

5o = (15) 



^DpnNA f vr ^ 



Bn = AnA^n ' ' , forn > (16) 



JDNA 



by neglecting (^) in front of (^ and for 9 G [0; |]: 

0^^^ + fl„tanW + f:fl„ -'""tr.f (17) 

cos^ 9 ^-^ cos*"+^ 9 

n=\ 

By expanding in variable ^ = tan^, we obtain a power series and identify 
the terms of the same degree. We can then numerically solve the infinite
system of algebraic equations by truncating at a certain rank. Finally,
by substituting into the expression of $u$ and averaging over a uniform
initial position, we obtain:

$$\tau_{free} \approx \frac{\epsilon^2}{D\rho_{DNA}}\left(0.3\ln\frac{1}{\rho_{DNA}} - 0.41 + 0.55\rho_{DNA}\right) \tag{18}$$
Table 1: Numerical parameters and results for Lac-I

Symbol          Description                               Value         Source
$D$             Lac-I diffusion constant                  m
$N_{bp}$        Number of bp in E. coli                   4.8 × 10^6
$l_{bp}$        Average length of a bp                    0.34 nm
$R$             Radius of E. coli                         0.4 μm        CCDB database
$L$             Length of E. coli                         2 μm          CCDB database
$E_{ns}$        Non specific energy                       -16 kT        [10]
$\sigma$        Specific energy roughness                 2 kT          Regtrans base
$\rho$          Correlation factor                        +2%           Regtrans base
$2\epsilon$     Diameter of DNA fiber                     30 nm
$R_{int}$       DNA double helix radius                   1 nm
$R_{ext}$       Potential external radius                 2 nm
$\tau_{free}$   Average time spent freely diffusing       1.5 ms
$\tau_{DNA}$    Average time spent unspecifically bound   5.7 ms
$\bar{n}$       Average number of bp scanned              75
$T_s$           Average time needed to find the target    7 min 40 s



where $\rho_{DNA}$ is the ratio of the absorbing DNA volume to the total
nuclear volume. In figure 2 we plot $\tau_{free}$ as a function of
$\rho_{DNA}$.

It is interesting to note that, when multiplying $N_{bp}$ and
$\rho_{DNA}$ by a factor $k$ (this increases the DNA density by a factor
$k$ and keeps the nucleus volume constant), the global search time given
in (1) is multiplied by a factor strictly smaller than $k$, while the
search covers $k$ times more information. We conclude that a higher DNA
density leads to a more efficient search process.
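The scaling argument can be illustrated numerically. The sketch below assumes that the free diffusion time has the form $\frac{\epsilon^2}{D\rho_{DNA}}(0.3\ln\frac{1}{\rho_{DNA}} - 0.41 + 0.55\rho_{DNA})$ and combines it with the general expression $(\tau_{DNA} + \tau_{free})N_{bp}/\bar{n}$ for the search time. The baseline density of 0.03 and the value $\epsilon^2/D = 0.075$ ms are assumptions chosen for illustration, and the function names are ours.

```python
import math

def tau_free(rho, eps2_over_D):
    """ASSUMED form: (eps^2 / (D*rho)) * (0.3*ln(1/rho) - 0.41 + 0.55*rho)."""
    return eps2_over_D / rho * (0.3 * math.log(1.0 / rho) - 0.41 + 0.55 * rho)

def search_time(rho, n_bp, tau_dna, n_scanned, eps2_over_D):
    """(tau_DNA + tau_free(rho)) * N_bp / n, with tau_free taken at density rho."""
    return (tau_dna + tau_free(rho, eps2_over_D)) * n_bp / n_scanned

# Illustrative inputs: rho = 0.03 and eps^2/D = 0.075 ms are assumptions.
base = dict(tau_dna=5.7e-3, n_scanned=75, eps2_over_D=7.5e-5)
reference = search_time(0.03, 4.8e6, **base)
for k in (2, 4, 8):
    ratio = search_time(0.03 * k, 4.8e6 * k, **base) / reference
    print(f"k = {k}: search time multiplied by {ratio:.2f}")
```

Because the free diffusion time decreases as the density increases, each ratio stays below $k$. The same conclusion holds for any decreasing dependence of $\tau_{free}$ on $\rho_{DNA}$, so it does not hinge on the numerical coefficients assumed here.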

Using formulas (3), (13) and (18) and the data given for E. coli in
table 1, we obtain $\tau_{free} = 1.5$ ms, $\tau_{DNA} = 5.7$ ms,
$\bar{n} = 75$ and an average search time of $T_s = 7$ min 48 s, which
is compatible with observed data [9]. Our conclusions do not rely on the
assumption that $\tau_{free} = \tau_{DNA}$, as we obtain two independent
expressions for $\tau_{free}$ and $\tau_{DNA}$. Moreover, we find that a
TF stays bound to the DNA molecule for roughly 80% of the total search
time. This agrees with the experimental data published in [9], where the
TF is bound to the DNA molecule around 87% of the time. It would be an
interesting problem to extend our method to a more general DNA
distribution.